Speaker-dependent audio-visual emotion recognition

Authors

  • Sanaul Haq
  • Philip J. B. Jackson
Abstract

This paper explores the recognition of expressed emotion from speech and facial gestures for the speaker-dependent case. Experiments were performed on an English audio-visual emotional database consisting of 480 utterances from 4 English male actors in 7 emotions. A total of 106 audio and 240 visual features were extracted, and features were selected with the Plus l-Take Away r algorithm based on the Bhattacharyya distance criterion. Linear transformation methods, principal component analysis (PCA) and linear discriminant analysis (LDA), were applied to the selected features, and Gaussian classifiers were used for classification. Performance was higher for LDA features than for PCA features. The visual features performed better than the audio features, and overall performance improved for the audio-visual features. In the case of 7 emotion classes, an average recognition rate of 56% was achieved with the audio features, 95% with the visual features and 98% with the audio-visual features selected by Bhattacharyya distance and transformed by LDA. Grouping the emotions into 4 classes, an average recognition rate of 69% was achieved with the audio features, 98% with the visual features and 98% with the audio-visual features fused at the decision level. The results were comparable to the measured human recognition rate with this multimodal data set.
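
The pipeline summarized above (feature selection by a class-separability criterion, a linear transformation, and Gaussian classification) can be illustrated with a minimal sketch. The Python snippet below is not the authors' implementation: it assumes diagonal-covariance Gaussian class models for the Bhattacharyya criterion, fixes l = 2 and r = 1 in the Plus l-Take Away r search, and uses scikit-learn's LDA and quadratic discriminant analysis as stand-ins for the linear transformation and the Gaussian classifiers; the function names and the n_select parameter are illustrative.

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)


def bhattacharyya(X1, X2):
    """Bhattacharyya distance between two classes, assuming Gaussian
    class-conditional densities with diagonal covariances."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    v1 = X1.var(axis=0) + 1e-8
    v2 = X2.var(axis=0) + 1e-8
    v = 0.5 * (v1 + v2)
    return 0.125 * np.sum((m1 - m2) ** 2 / v) + 0.5 * np.sum(np.log(v / np.sqrt(v1 * v2)))


def separability(X, y, feats):
    """Average pairwise Bhattacharyya distance over the chosen feature subset."""
    classes = np.unique(y)
    pairs = [(a, b) for i, a in enumerate(classes) for b in classes[i + 1:]]
    return np.mean([bhattacharyya(X[y == a][:, feats], X[y == b][:, feats])
                    for a, b in pairs])


def plus_l_take_away_r(X, y, n_select, l=2, r=1):
    """Greedy Plus l-Take Away r search: repeatedly add the l most useful
    features, then drop the r least useful ones, until n_select remain."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < n_select and remaining:
        for _ in range(l):                                      # "Plus l" step
            if not remaining:
                break
            best = max(remaining, key=lambda f: separability(X, y, selected + [f]))
            selected.append(best)
            remaining.remove(best)
        for _ in range(r):                                      # "Take away r" step
            worst = max(selected,
                        key=lambda f: separability(X, y, [g for g in selected if g != f]))
            selected.remove(worst)
            remaining.append(worst)
    return selected[:n_select]


def train(X_train, y_train, n_select=40):
    """Select features, fit the LDA transform, then fit a Gaussian
    (quadratic discriminant) classifier on the transformed features."""
    feats = plus_l_take_away_r(X_train, y_train, n_select)
    lda = LinearDiscriminantAnalysis().fit(X_train[:, feats], y_train)
    gauss = QuadraticDiscriminantAnalysis().fit(lda.transform(X_train[:, feats]), y_train)
    return feats, lda, gauss
```

The decision-level fusion mentioned for the 4-class experiments is not shown; it would amount to combining the class scores of separately trained audio and visual classifiers of this kind.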

Similar articles

Speaker-Dependent Human Emotion Recognition in Unimodal and Bimodal Scenarios

This paper presents an efficient technique for human emotion recognition using the audio and visual modalities in the speaker-dependent scenario. To achieve better emotion classification, different audio and visual features were extracted. Feature selection was performed using the Plus l-Take Away r algorithm based on two criteria: Mahalanobis distance and KL-divergence (a brief sketch of these two criteria is given after this entry). The feature selection w...

Full text
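
The entry above uses the same Plus l-Take Away r search but scores feature subsets with Mahalanobis distance and KL-divergence rather than Bhattacharyya distance. The sketch below shows those two criteria under the same diagonal-covariance Gaussian assumption used earlier; it is not the authors' implementation, and either function could replace the Bhattacharyya criterion in the selection sketch above.

```python
import numpy as np


def mahalanobis_distance(X1, X2):
    """Mahalanobis distance between two class means under a pooled
    diagonal covariance (a simplifying assumption for this sketch)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    v = 0.5 * (X1.var(axis=0) + X2.var(axis=0)) + 1e-8
    return float(np.sqrt(np.sum((m1 - m2) ** 2 / v)))


def symmetric_kl(X1, X2):
    """Symmetric KL-divergence between diagonal-covariance Gaussians
    fitted to the two classes."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    v1 = X1.var(axis=0) + 1e-8
    v2 = X2.var(axis=0) + 1e-8
    kl_12 = 0.5 * np.sum(np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)
    kl_21 = 0.5 * np.sum(np.log(v1 / v2) + (v2 + (m1 - m2) ** 2) / v1 - 1.0)
    return float(kl_12 + kl_21)
```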

MEMN: Multimodal Emotional Memory Network for Emotion Recognition in Dyadic Conversational Videos

Multimodal emotion recognition is a developing field of research which aims at detecting emotions in videos. For conversational videos, current methods mostly ignore the role of inter-speaker dependency relations while classifying emotions. In this paper, we address recognizing utterance-level emotions in dyadic conversations. We propose a deep neural framework, termed Multimodal Emotional Memo...

Full text

Cultural similarities and differences in the recognition of audio-visual speech stimuli

Cultural similarities and differences were compared between Japanese and North American subjects in the recognition of emotion. Seven native Japanese and five native North American (four American and one Canadian) subjects participated in the experiments. The materials were five meaningful words or short sentences in Japanese and American English. Japanese and American actors made vocal and fa...

Full text

Emotion recognition using linear transformations in combination with video

The paper discusses the use of linear transformations of Hidden Markov Models, normally employed for speaker and environment adaptation, as a way of extracting the emotional components from speech. A constrained version of the Maximum Likelihood Linear Regression (CMLLR) transformation is used as a feature for classification of normal or aroused emotional state. We present a procedure of incre...

Full text

An audio-visual approach to simultaneous-speaker speech recognition

Audio-visual speech recognition is an area with great potential to help solve challenging problems in speech processing. Difficulties due to background noise are significantly reduced by the additional information provided by visual features. The presence of additional speech from other talkers during recording may be viewed as one of the most difficult sources of noise. This paper prese...

Full text


Journal title:

Volume   Issue

Pages   -

Publication date: 2009